class: center, middle, inverse, title-slide .title[ # STA 660: Practicum in Data Analysis ] .subtitle[ ## A Sample of Consulting/Research Projects in U.S. Manufacturing, Healthcare, and Transportation Industries ] .author[ ###
Fadel M. Megahed, PhD
Endres Associate Professor
Farmer School of Business
Miami University
@FadelMegahed
fmegahed
fmegahed@miamioh.edu
] .date[ ### Sept. 11, 2023 ] --- # Preface: Why Do Humans Need Help in Making Decisions? <center> <iframe width="738" height="415" src="https://www.youtube.com/embed/FWSxSQsspiQ" title="The Door Study" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe> </center> .footnote[ <html> <hr> </html> **Source:** Simons, D. J. and Levin, D. T. (1998). “Failure to detect changes to people during a real-world interaction”. *Psychonomic Bulletin & Review*, 5(4), pp. 644-649. Click [here for full text](http://link.springer.com/article/10.3758%2FBF03208840). ] --- # Preface: Data `\(\neq\)` Knowledge .content-box-grey[ .font90[ - **2011:** “**More information ought to be useful, but only if companies can interpret it**. And workers are already overloaded: **62%** of them say that the quality of what they do is **hampered because they cannot make sense of the data they already have …**” [The Economist](https://www.economist.com/business/2011/12/31/too-much-buzz) - **2022:** “**Eight out of 10 global workers are suffering from information overload**” + “Of those surveyed in the U.S., **76%** felt that information overload contributes to their **daily stress**. Another **35%** said this overload is having a **detrimental effect on their work performance, with 30% revealing it is affecting their overall job satisfaction** ...” [Datanami](https://www.datanami.com/2022/08/18/report-80-of-global-workers-experience-information-overload/) ] ] .content-box-grey[ .font90[ **"Information, no matter how complete and speedy, is not knowledge. Knowledge has temporal spread. Knowledge comes from theory. 
Without theory, there is no way to use the information that comes to us on the instant."** [Deming (Out of the Crisis, p. 106)](https://books.google.co.kr/books?hl=en&lr=&id=RTNwDwAAQBAJ&oi=fnd&pg=PR7&dq=deming+out+of+the+crisis&ots=V1tti3F9L2&sig=c0es629lQeWzx0GGUFjViMDgiZc&redir_esc=y#v=onepage&q=deming%20out%20of%20the%20crisis&f=false) ] ] --- # My Background .font80[ - Application of data-driven decisions `\((D^3)\)` on 3 continents. - **Interests:** Applications in logistics, manufacturing, occupational safety & portfolios. - **Collaborations with:** Aflac, GE Research, Gore, Hilton, IBM Research, & Tennibot ] <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#intro_consulting_files/figure-html/my_map-1.gif" alt="My journey with data-driven decisions." width="100%" /> <p class="caption">My journey with data-driven decision making.</p> </div> --- class: center, inverse, middle # Our Work in Data Analysis for Advanced Manufacturing # (Industry 4.0/5.0) --- # Background: The 5 Industrial Revolutions? <img src="data:image/png;base64,#figs/operator50_Page_03_Image_0001.png" alt="The five industrial revolutions. From Industry 1.0 to Industry 2.0 (mass production) to Industry 3.0 (automation) to Industry 4.0 (fueled by Big Data) to Industry 5.0 (human-machine symbiosis)" width="100%" style="display: block; margin: auto;" /> .center[The five industrial revolutions.] .footnote[ <html> <hr> </html> **Image Source:** The image is from Mourtzis, Dimitris, John Angelopoulos, and Nikos Panopoulos. "Operator 5.0: A survey on enabling technologies and a framework for digital manufacturing based on extended reality." *Journal of Machine Engineering* 22 (2022). Click [here](https://yadda.icm.edu.pl/baztech/element/bwmeta1.element.baztech-159441e0-80bf-4fed-8670-f0d61c3439f1) for full article text. 
Note that Industry 5.0 has been popularized by the EU Commission per [their policy brief](https://op.europa.eu/en/web/eu-law-and-publications/publication-detail/-/publication/38a2fa08-728e-11ec-9136-01aa75ed71a1). ] --- count: false # Background: The 5 Industrial Revolutions? <img src="data:image/png;base64,#figs/operator50_Page_18_Image_0001.png" alt="The three pillars of the fifth industrial revolution. Pillar 1 is being human-centric. Pillar 2 is being sustainable. Pillar 3 is the resilience of Industry 5.0" width="55%" style="display: block; margin: auto;" /> .center[The three pillars of the fifth industrial revolution.] .footnote[ <html> <hr> </html> **Image Source:** The image is from Mourtzis, Dimitris, John Angelopoulos, and Nikos Panopoulos. "Operator 5.0: A survey on enabling technologies and a framework for digital manufacturing based on extended reality." *Journal of Machine Engineering* 22 (2022). Click [here](https://yadda.icm.edu.pl/baztech/element/bwmeta1.element.baztech-159441e0-80bf-4fed-8670-f0d61c3439f1) for full article text. Note that Industry 5.0 has been popularized by the EU Commission per [their policy brief](https://op.europa.eu/en/web/eu-law-and-publications/publication-detail/-/publication/38a2fa08-728e-11ec-9136-01aa75ed71a1). ] --- count: false # Background: The 5 Industrial Revolutions? <img src="data:image/png;base64,#figs/operator50_Page_21_Image_0001.png" alt="A conceptual framework for operator 5.0, highlighting the role of wearable sensors for posture and worker performance analysis" width="65%" style="display: block; margin: auto;" /> .center[Conceptual framework for Operator 5.0 and Cobot in Industry 5.0.] .footnote[ <html> <hr> </html> **Image Source:** The image is from Mourtzis, Dimitris, John Angelopoulos, and Nikos Panopoulos. "Operator 5.0: A survey on enabling technologies and a framework for digital manufacturing based on extended reality." *Journal of Machine Engineering* 22 (2022). 
Click [here](https://yadda.icm.edu.pl/baztech/element/bwmeta1.element.baztech-159441e0-80bf-4fed-8670-f0d61c3439f1) for full article text. Note that Industry 5.0 has been popularized by the EU Commission per [their policy brief](https://op.europa.eu/en/web/eu-law-and-publications/publication-detail/-/publication/38a2fa08-728e-11ec-9136-01aa75ed71a1). ] --- # An Overview of My Industry 4.0/5.0 Work <img src="data:image/png;base64,#figs/advanced_manufacturing.png" alt="A flow chart of my work in advanced manufacturing, highlighting my two main streams of research" width="100%" style="display: block; margin: auto;" /> --- # Solving the Right Problem <center> <iframe width="738" height="415" src="https://www.youtube.com/embed/aTWQ39yyJa8" title="Image Analysis vs Acoustics" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe> </center> .footnote[ <html> <hr> </html> **The iMAS Lab:** As a researcher in the iMAS Lab at Virginia Tech, I have learned the most important rule in business consulting: your role is to **add value by solving the right problem (from acoustics to image processing)**. 
] --- ## Variation Visualization+Understanding for SUV Assembly .content-box-grey[ .font80[ .center[.black[.bold[An Industry 4.0 problem involving CMM data and assembly trouble that was not diagnosed]]] - A **major US automotive plant** provided us with a dataset of 1,000 vehicles, which had these characteristics*: + 100% sampling was performed at 27 locations using CMMs + 39 measurements (deviations from nominal) per vehicle ] ] <img src="data:image/png;base64,#figs/suv.jpg" alt="A 2D CAD of the SUV panel with the 27 locations and 39 measurement points" width="80%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> * Due to proprietary issues, the original data values, such as the number and location of sample points (black spheres), and the sample data have been slightly modified. ] --- # Our Solution for the Industry 4.0 Problem <center> <iframe width="738" height="415" src="https://www.youtube.com/embed/Tq8d9aj8UbI" title="Our Visualization of the Match-Boxing Assembly Error" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe> </center> .footnote[ <html> <hr> </html> **Source:** Wells, L. J., **Megahed, F. M.**, Camelio, J. A., & Woodall, W. H. (2012). A framework for variation visualization and understanding in complex manufacturing systems. *Journal of Intelligent Manufacturing*, 23, 2025-2036. Click [here for full text](https://link.springer.com/content/pdf/10.1007/s10845-011-0529-1.pdf). ] --- # Worker Performance Monitoring: Motivation <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#figs/box_score.PNG" alt="Did Auburn fatigue and lose a national championship?" 
width="93%" /> <p class="caption">Did Auburn fatigue and lose a national championship?</p> </div> .footnote[ <html> <hr> </html> **Source:** The box score for the BCS Championship game between Auburn and FSU, available from <http://www.sports-reference.com/cfb/boxscores/2014-01-06-auburn.html>. ] --- count: false # Worker Performance Monitoring: Motivation <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#figs/espn.PNG" alt="Did Auburn fatigue and lose a national championship?" width="47%" /> <p class="caption">Did Auburn fatigue and lose a national championship?</p> </div> .footnote[ <html> <hr> </html> **Source:** The ESPN article that started our research collaboration: <http://www.espn.com/college-football/story/_/id/11121315/florida-state-seminoles-coach-jimbo-fisher-use-gps-technology-win-national-championship>. ] --- count:false # Worker Performance Monitoring: Motivation <img src="data:image/png;base64,#figs/catapult.PNG" alt="The use of wearable sensors for physical fatigue management is widespread in athletics" width="75%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Figure Source:** <https://www.catapultsports.com/sports>. ] --- ## Worker Performance Monitoring: Industry vs Athletics .black[.bold[The use of wearables for fatigue management is easier in sports than in more traditional occupational settings!!!]] .pull-left[ <img src="data:image/png;base64,#figs/fitt_manufacturing.png" alt="The application of the FITT principle to industry" width="95%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="data:image/png;base64,#figs/fitt_sports.png" alt="The application of the FITT principle to sports" width="95%" style="display: block; margin: auto;" /> ] --- # Gait Monitoring for Industry 5.0 Operations .content-box-grey[ .center[.black[.bold[Industry-Driven Research Questions]]] .font80[ 1. How do important gait parameters (e.g., stride length, height and duration) change over time? 
2. How do these sensor-based changes relate to the participants’ subjective fatigue ratings? 3. Are there consistent patterns in performance across different individuals over time? ] ] .pull-left[ <img src="data:image/png;base64,#figs/jqt/ankle.jpeg" alt="The attachment of the Shimmer IMU sensor on a participant's right ankle" width="52%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="data:image/png;base64,#figs/jqt/path.png" alt="A CAD representation of the pathway used by the participants during the material handling experiment" width="62%" style="display: block; margin: auto;" /> ] .footnote[ <html> <hr> </html> **Source:** Baghdadi, A., Cavuoto, L. A., Jones-Farmer, A., Rigdon, S. E., Esfahani, E. T., & **Megahed, F. M.** (2021). Monitoring worker fatigue using wearable devices: A case study to detect changes in gait parameters. *Journal of Quality Technology*, 53(1), 47-71. Paper can be downloaded from [this link](https://par.nsf.gov/servlets/purl/10218083). ] --- count:false # Gait Monitoring for Industry 5.0 Operations <img src="data:image/png;base64,#figs/jqt/IMUpreprocess.jpg" alt="The four steps of IMU preprocessing: rotational angle estimation for foot angle estimation, stride segmentation where we look for two humps associated with the gait cycle, translational velocity estimation for stride duration computations, and foot trajectory estimation for stride height and length estimation" width="42%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** Baghdadi, A., Cavuoto, L. A., Jones-Farmer, A., Rigdon, S. E., Esfahani, E. T., & **Megahed, F. M.** (2021). Monitoring worker fatigue using wearable devices: A case study to detect changes in gait parameters. *Journal of Quality Technology*, 53(1), 47-71. Paper can be downloaded from [this link](https://par.nsf.gov/servlets/purl/10218083). 
] --- count:false # Gait Monitoring for Industry 5.0 Operations <center> <video width="63%" height="63%" autoplay controls aria-label="A video showing that the scaled stride height, length and duration for all 15 participants show a significant amount of noise, which needs to be filtered out prior to any meaningful processing."> <source src="data:image/png;base64,#figs/LineGraph.mp4" type="video/mp4"> </video> </center> .footnote[ <html> <hr> </html> **Source:** Baghdadi, A., Cavuoto, L. A., Jones-Farmer, A., Rigdon, S. E., Esfahani, E. T., & **Megahed, F. M.** (2021). Monitoring worker fatigue using wearable devices: A case study to detect changes in gait parameters. *Journal of Quality Technology*, 53(1), 47-71. Paper can be downloaded from [this link](https://par.nsf.gov/servlets/purl/10218083). ] --- count:false # Gait Monitoring for Industry 5.0 Operations <center> <video width="63%" height="63%" autoplay controls aria-label="A video showing that the use of median filtering, with a rolling window, has significantly smoothed the scaled stride height, length and duration for all 15 participants."> <source src="data:image/png;base64,#figs/MedianFiltering.mp4" type="video/mp4"> </video> </center> .footnote[ <html> <hr> </html> **Source:** Baghdadi, A., Cavuoto, L. A., Jones-Farmer, A., Rigdon, S. E., Esfahani, E. T., & **Megahed, F. M.** (2021). Monitoring worker fatigue using wearable devices: A case study to detect changes in gait parameters. *Journal of Quality Technology*, 53(1), 47-71. Paper can be downloaded from [this link](https://par.nsf.gov/servlets/purl/10218083). 
] --- count:false # Gait Monitoring for Industry 5.0 Operations <center> <video width="63%" height="63%" autoplay controls aria-label="A video showing that the use of the multivariate changepoint models and the cumulative sum of each signal can detect distinct changepoints that are associated with fatigue development in each participant."> <source src="data:image/png;base64,#figs/RPE_Analysis.mp4" type="video/mp4"> </video> </center> .footnote[ <html> <hr> </html> **Source:** Baghdadi, A., Cavuoto, L. A., Jones-Farmer, A., Rigdon, S. E., Esfahani, E. T., & **Megahed, F. M.** (2021). Monitoring worker fatigue using wearable devices: A case study to detect changes in gait parameters. *Journal of Quality Technology*, 53(1), 47-71. Paper can be downloaded from [this link](https://par.nsf.gov/servlets/purl/10218083). ] --- count:false # Gait Monitoring for Industry 5.0 Operations .pull-left[ .font90[ .center[.bold[From the cluster visualization (right), we observe the following patterns:]] - Fatigue development patterns vary widely across participants. + **Clusters 1 and 2:** Participants generally take **shorter, lower and slower steps when they are fatigued**. + **Cluster 3:** The opposite effect is observed, where participants tend to take **longer, higher but faster strides**. ] ] .pull-right[ <img src="data:image/png;base64,#figs/jqt/dtwClustersCUSUM.png" alt="A depiction of the median curve for each cluster in a panel plot. The top panel shows 4 curves, one for each cluster, for the stride length. The middle panel shows stride height and the lower panel shows stride duration. The explanation of the main trends for the three of the four clusters is in the text for the slide. I ignored explaining cluster four since it contained only one participant. Refer to the paper for more details." 
width="76%" style="display: block; margin: auto;" /> ] .footnote[ <html> <hr> </html> **Source:** Baghdadi, A., Cavuoto, L. A., Jones-Farmer, A., Rigdon, S. E., Esfahani, E. T., & **Megahed, F. M.** (2021). Monitoring worker fatigue using wearable devices: A case study to detect changes in gait parameters. *Journal of Quality Technology*, 53(1), 47-71. Paper can be downloaded from [this link](https://par.nsf.gov/servlets/purl/10218083). ] --- ## End-to-End Ergonomic Risk Assessment for Operator 5.0 .content-box-grey[ .center[.bold[Research Questions]] .font80[ 1. How can we automatically divide the time series of acceleration signals, obtained from a wrist-worn accelerometer, into different segments? 2. Within each segment, can we determine the correct task label for the segment? 3. How accurately can we determine the total duration of each identified/performed task? 4. How accurately can we determine the number of repetitions for repetitive tasks, such as the number of lifts or the number of hand pulls in a hoisting task? 5. How can we fuse the physical ergonomic risk factors (i.e., task type, duration, repetition) from the sensor with load and task-design information to systematically estimate the probability of a workplace injury? ] ] .footnote[ <html> <hr> </html> **Source:** Lamooki, S. R., Hajifar, S., Kang, J., Sun, H., **Megahed, F. M.**, & Cavuoto, L. A. (2022). A data analytic end-to-end framework for the automated quantification of ergonomic risk factors across multiple tasks using a single wearable sensor. *Applied Ergonomics*, 102, 103732. Paper can be downloaded from [this link](https://www.sciencedirect.com/science/article/pii/S0003687022000552). ] --- count: false ## End-to-End Ergonomic Risk Assessment for Operator 5.0 <img src="data:image/png;base64,#figs/ergo_risk_assessment_conceptual.jpg" alt="Here, we present an overview of our proposed end-to-end framework for ergonomic risk assessment using a single accelerometer. 
The framework extends the MSAAI framework of (Lim and D'Souza, 2020) by: (a) explicitly highlighting the importance of clearly defining the ergonomic problem of interest to ensure that the goals and requirements are distinctly communicated; (b) dividing the modeling stage into planning (in our first and second phases) and feature engineering (in our third phase); and (c) expanding on the steps in the analysis phase to include change point detection, task identification (activity recognition), and estimation of task duration/repetition count." width="70%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** Lamooki, S. R., Hajifar, S., Kang, J., Sun, H., **Megahed, F. M.**, & Cavuoto, L. A. (2022). A data analytic end-to-end framework for the automated quantification of ergonomic risk factors across multiple tasks using a single wearable sensor. *Applied Ergonomics*, 102, 103732. Paper can be downloaded from [this link](https://www.sciencedirect.com/science/article/pii/S0003687022000552). ] --- count: false ## End-to-End Ergonomic Risk Assessment for Operator 5.0 <img src="data:image/png;base64,#figs/ergo_risk_assessment.jpg" alt="The details of the ergonomic risk assessment framework." width="41%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** Lamooki, S. R., Hajifar, S., Kang, J., Sun, H., **Megahed, F. M.**, & Cavuoto, L. A. (2022). A data analytic end-to-end framework for the automated quantification of ergonomic risk factors across multiple tasks using a single wearable sensor. *Applied Ergonomics*, 102, 103732. Paper can be downloaded from [this link](https://www.sciencedirect.com/science/article/pii/S0003687022000552). ] --- # Our Future Directions in Operator 5.0 Research <img src="data:image/png;base64,#figs/niosh.png" alt="We are currently examining whether system reliability models can be used to model the repeated cycles of fatigue and recovery in human workers. 
This is a joint 3-year project with the University at Buffalo and funded by the National Institute for Occupational Safety and Health." width="86%" style="display: block; margin: auto;" /> --- class: center, middle, inverse # Our Work in Healthcare and Public Health --- # An Overview of My Health Analytics Work <img src="data:image/png;base64,#figs/health.png" alt="A flow chart of my work in health analytics, highlighting my two main streams of research" width="100%" style="display: block; margin: auto;" /> --- # Improving Heart Transplant Prediction ### United Network for Organ Sharing (UNOS) - The United Network for Organ Sharing (UNOS) is a nonprofit association that manages the U.S. organ transplantation system and database. - Our UNOS dataset contained 129 predictors (43 numeric and 86 categorical) and 45,005 observations. + The imbalance ratio in the response variable is ~ 6.29 (where 1-year graft survival = 38,829 & failure = 6,176 cases) .footnote[ <html> <hr> </html> **Source:** Dolatsara, H. A., Chen, Y. J., Evans, C., Gupta, A., & **Megahed, F. M.** (2020). A two-stage machine learning framework to predict heart transplantation survival probabilities over time with a monotonic probability constraint. *Decision Support Systems*, 137, 113363. Paper can be downloaded from [this link](https://www.sciencedirect.com/science/article/pii/S0167923620301184). 
] --- count: false # Improving Heart Transplant Prediction .pull-left[ - Develop **independent ML models** for each time-period (e.g., see [Yoon 2018](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0194985) ) - Utilize a **sequential prediction** approach where the “survival” probabilities from time periods `\(1, \dots, t\)` are inputs to predict the survival at time `\(t+1\)` ([Ohno-Machado 1997](https://www.sciencedirect.com/science/article/pii/S0010482597000085)) - Predict the outcome probability for one time-period and use the **population’s survival outcomes to calibrate** other periods’ probabilities ([Medved 2018](https://www.nature.com/articles/s41598-018-21417-7)) ] .pull-right[ .center[**What we really want**] <img src="data:image/png;base64,#figs/isotonic.gif" alt="Proposed use of isotonic regression to tune survival probabilities" width="100%" style="display: block; margin: auto;" /> Achievable by applying isotonic regression to the outputs of the different ML models. ] .footnote[ <html> <hr> </html> **Source:** Dolatsara, H. A., Chen, Y. J., Evans, C., Gupta, A., & **Megahed, F. M.** (2020). A two-stage machine learning framework to predict heart transplantation survival probabilities over time with a monotonic probability constraint. *Decision Support Systems*, 137, 113363. Paper can be downloaded from [this link](https://www.sciencedirect.com/science/article/pii/S0167923620301184). ] --- count: false # Improving Heart Transplant Prediction <img src="data:image/png;base64,#https://ars.els-cdn.com/content/image/1-s2.0-S0167923620301184-gr3_lrg.jpg" alt="Adopting the proposed two-stage framework to obtain monotonically decreasing heart transplantation survival probabilities over time." width="80%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> **Source:** Dolatsara, H. A., Chen, Y. J., Evans, C., Gupta, A., & **Megahed, F. M.** (2020). 
A two-stage machine learning framework to predict heart transplantation survival probabilities over time with a monotonic probability constraint. *Decision Support Systems*, 137, 113363. Paper can be downloaded from [this link](https://www.sciencedirect.com/science/article/pii/S0167923620301184). ] --- count: false # Improving Heart Transplant Prediction .panelset[ .panel[.panel-name[**Sample Patient Predictions**] <img src="data:image/png;base64,#https://ars.els-cdn.com/content/image/1-s2.0-S0167923620301184-gr6_lrg.jpg" alt="Plots of the forecast probabilities for four sample patients at years 1 − 10 post transplant. The light and dark blue lines show the probabilities before and after the application of isotonic regression, respectively. Note when the dark blue line is not shown, this equates to a perfect overlap with the light blue line." width="35%" style="display: block; margin: auto;" /> ] .panel[.panel-name[**Unexpected Improvements in Predictive Metrics**] .center[**Differences (after isotonic regression – before isotonic regression) in the holdout performance**] .font75[ <html> <table> <thead> <tr> <th></th> <th><span style="font-weight:700">Month 1</span></th> <th><span style="font-weight:700">Year 1</span></th> <th><span style="font-weight:700">Year 2</span></th> <th><span style="font-weight:700">Year 3</span></th> <th><span style="font-weight:700">Year 4</span></th> <th><span style="font-weight:700">Year 5</span></th> <th><span style="font-weight:700">Year 6</span></th> <th><span style="font-weight:700">Year 7</span></th> <th><span style="font-weight:700">Year 8</span></th> <th><span style="font-weight:700">Year 9</span></th> <th><span style="font-weight:700">Year 10</span></th> </tr> </thead> <tbody> <tr> <td><span style="font-weight:bolder">Δ AUC</span></td> <td><span style="font-weight:bolder">0.008</span></td> <td><span style="font-weight:bolder">0.017</span></td> <td><span style="font-weight:bolder">0.029</span></td> <td><span 
style="font-weight:bolder">0.021</span></td> <td><span style="font-weight:bolder">0.011</span></td> <td><span style="font-weight:bolder">0.007</span></td> <td><span style="font-weight:bolder">0.001</span></td> <td><span style="font-weight:bolder">0.001</span></td> <td><span style="font-weight:bolder">0.000</span></td> <td><span style="font-weight:bolder">0.002</span></td> <td><span style="font-weight:bolder">0.001</span></td> </tr> <tr> <td><span style="font-weight:bolder">Δ Accuracy</span></td> <td><span style="font-weight:bolder">0.125</span></td> <td><span style="font-weight:bolder">0.083</span></td> <td><span style="font-weight:bolder">0.061</span></td> <td><span style="font-weight:bolder">0.042</span></td> <td><span style="font-weight:bolder">0.017</span></td> <td><span style="font-weight:bolder">0.007</span></td> <td>−0.005</td> <td>−0.008</td> <td>−0.006</td> <td><span style="font-weight:bolder">0.001</span></td> <td><span style="font-weight:bolder">0.008</span></td> </tr> <tr> <td><span style="font-weight:bolder">Δ Sensitivity</span></td> <td><span style="font-weight:bolder">0.150</span></td> <td><span style="font-weight:bolder">0.136</span></td> <td><span style="font-weight:bolder">0.110</span></td> <td><span style="font-weight:bolder">0.083</span></td> <td><span style="font-weight:bolder">0.032</span></td> <td><span style="font-weight:bolder">0.007</span></td> <td>−0.037</td> <td>−0.061</td> <td>−0.102</td> <td>−0.130</td> <td>−0.161</td> </tr> <tr> <td><span style="font-weight:bolder">Δ Specificity</span></td> <td>−0.114</td> <td>−0.105</td> <td>−0.064</td> <td>−0.047</td> <td>−0.009</td> <td><span style="font-weight:bolder">0.008</span></td> <td><span style="font-weight:bolder">0.034</span></td> <td><span style="font-weight:bolder">0.050</span></td> <td><span style="font-weight:bolder">0.087</span></td> <td><span style="font-weight:bolder">0.112</span></td> <td><span style="font-weight:bolder">0.134</span></td> </tr> <tr> <td><span 
style="font-weight:bolder">Δ G-Mean</span></td> <td><span style="font-weight:bolder">0.009</span></td> <td><span style="font-weight:bolder">0.010</span></td> <td><span style="font-weight:bolder">0.019</span></td> <td><span style="font-weight:bolder">0.016</span></td> <td><span style="font-weight:bolder">0.010</span></td> <td><span style="font-weight:bolder">0.007</span></td> <td><span style="font-weight:bolder">0.000</span></td> <td>−0.003</td> <td>−0.003</td> <td>−0.001</td> <td>−0.005</td> </tr> </tbody> </table> </html> ] ] ] .footnote[ <html> <hr> </html> **Source:** Dolatsara, H. A., Chen, Y. J., Evans, C., Gupta, A., & **Megahed, F. M.** (2020). A two-stage machine learning framework to predict heart transplantation survival probabilities over time with a monotonic probability constraint. *Decision Support Systems*, 137, 113363. Paper can be downloaded from [this link](https://www.sciencedirect.com/science/article/pii/S0167923620301184). ] --- # COVID-19 Deaths in 3,108 U.S. Counties .pull-right-2[ .content-box-grey[ .center[.black[.bold[Motivation]]] .font80[ - “Across the world, public health data are gathered at a very local level before aggregation into regional and national figures.... While useful as a summary, local distinctions get lost, painting a misleading image of whole countries being affected uniformly.” [Financial Times](https://ig.ft.com/coronavirus-global-data/) - The chart illustrates how national and regional patterns differ within the United States. ] ] ] .pull-left-2[ <img src="data:image/png;base64,#figs/covid_deaths/us_vs_counties.png" alt="the national 7-day moving average of deaths as well as the various patterns that arise among 8 example counties from Sunday, March 1, 2020, to Saturday, February 27, 2021. For example, New York, NY, experienced a large first wave of deaths, followed by a relatively low death count through the remainder of the study. 
Nearby Ocean County, NJ, a populous county near the New Jersey shore had a large first wave of deaths, followed by a second wave beginning in late 2020. In contrast, Butler County, OH, a populous midwestern county, showed low death counts until late in the study period. None of these patterns mimics the overall pattern for the aggregate death counts in the United States." width="100%" style="display: block; margin: auto;" /> ] .footnote[ <html> <hr> </html> **Source:** **Megahed, F. M.**, Jones-Farmer, L. A., Ma, Y., & Rigdon, S. E. (2022). Explaining the Varying Patterns of COVID-19 Deaths Across the United States: 2-Stage Time Series Clustering Framework. *JMIR Public Health and Surveillance*, 8(7), e32164. Paper can be downloaded from [this link](https://publichealth.jmir.org/2022/7/e32164). ] --- count: false # COVID-19 Deaths in 3,108 U.S. Counties .pull-right-2[ .content-box-grey[ .center[.black[.bold[Research Questions]]] .font80[ - How many distinct clusters of U.S. counties exhibit similar time series patterns in the deaths due to COVID-19? - How are these clusters geographically distributed across the United States? - Are certain geographic, political, government, and social vulnerability variables associated with the patterns of COVID-19 related deaths? ] ] ] .pull-left-2[ <img src="data:image/png;base64,#figs/covid_deaths/us_vs_counties.png" alt="the national 7-day moving average of deaths as well as the various patterns that arise among 8 example counties from Sunday, March 1, 2020, to Saturday, February 27, 2021. For example, New York, NY, experienced a large first wave of deaths, followed by a relatively low death count through the remainder of the study. Nearby Ocean County, NJ, a populous county near the New Jersey shore had a large first wave of deaths, followed by a second wave beginning in late 2020. In contrast, Butler County, OH, a populous midwestern county, showed low death counts until late in the study period. 
None of these patterns mimics the overall pattern for the aggregate death counts in the United States." width="100%" style="display: block; margin: auto;" /> ] .footnote[ <html> <hr> </html> **Source:** **Megahed, F. M.**, Jones-Farmer, L. A., Ma, Y., & Rigdon, S. E. (2022). Explaining the Varying Patterns of COVID-19 Deaths Across the United States: 2-Stage Time Series Clustering Framework. *JMIR Public Health and Surveillance*, 8(7), e32164. Paper can be downloaded from [this link](https://publichealth.jmir.org/2022/7/e32164). ] --- count: false # COVID-19 Deaths in 3,108 U.S. Counties
.footnote[ <html> <hr> </html> **Source:** **Megahed, F. M.**, Jones-Farmer, L. A., Ma, Y., & Rigdon, S. E. (2022). Explaining the Varying Patterns of COVID-19 Deaths Across the United States: 2-Stage Time Series Clustering Framework. *JMIR Public Health and Surveillance*, 8(7), e32164. Paper can be downloaded from [this link](https://publichealth.jmir.org/2022/7/e32164). ] --- count: false # COVID-19 Deaths in 3,108 U.S. Counties .pull-left[ - Using a multinomial regression model, we were able to: + Identify some of the **explanatory factors** that can be used to explain a **county's assignment to a given cluster**. + Achieve an **overall accuracy of about 61.25%**, which is statistically significant when compared to the baseline of 25% (random prediction of the four classes). ] .pull-right[ <img src="data:image/png;base64,#figs/cluster_explanations.png" alt="A table of the multinomial regression coefficients used to explain the differences in cluster membership." width="70%" style="display: block; margin: auto;" /> ] --- # Future Directions: Data Quality in Public Health <center> <video width="72%" height="72%" autoplay controls aria-label="A video showing structural missingness in U.S. public health data (illustrated with the case of COVID-19 vaccinations). 
In our current work, we are examining how statistical and mathematical models can be combined to address this issue.">
<source src="data:image/png;base64,#figs/animated_vaccine_map.mp4" type="video/mp4">
</video>
</center>

---
class: center, middle, inverse

# Our Work in Transportation Analytics

---

# An Overview of My Transportation Portfolio

<img src="data:image/png;base64,#figs/logistics.png" alt="A flow chart of my work in transportation analytics, highlighting my two main streams of research" width="100%" style="display: block; margin: auto;" />

---

# Seat Assignment with Social Distancing

.content-box-grey[
.center[.bold[Motivation]]
.font90[
- In the U.S., school districts created strict layouts for student bus seating depending on the **bus configuration** and **social distancing** requirements (6 ft `\(\rightarrow\)` 3.3 ft). See [KY Guidelines Archived](https://archive.org/details/ERIC_ED611894/page/6/mode/2up).
- However, these guidelines did not account for the fact that students from the same household often ride together on the same school bus.
]
]

<img src="data:image/png;base64,#figs/moore1.png" width="80%" style="display: block; margin: auto;" />

.footnote[
<html> <hr> </html>
**Source:** Moore, J. F., Carvalho, A., Davis, G. A., Abulhassan, Y., & **Megahed, F. M.** (2021). Seat assignments with physical distancing in single-destination public transit settings. *IEEE Access*, 9, 42985-42993. Paper can be downloaded from [this link](https://ieeexplore.ieee.org/abstract/document/9374410).
]

---
count: false

# Seat Assignment with Social Distancing

.center[Experimental results for school buses and a 6-ft minimum physical distancing requirement.]

| Bus Capacity in Persons (P) | KY Guidelines per [link](https://archive.org/details/ERIC_ED611894/page/6/mode/2up) in P (% occupancy) | MIP Solution in P <html><br></html> (% occupancy) | Heuristic Solution in P <html><br></html> (% occupancy) |
| :---: | :---: | :---: | :---: |
| .bold[34P] | 4P (11.8%) | 4P (11.8%) | 5P (14.7%) |
| .bold[52P] | 6P (11.5%) | 6P (11.5%) | 8P (15.4%) |
| .bold[66P] | 7P (10.6%) | 8P (12.1%) | 10P (15.2%) |
| .bold[72P] | 8P (11.1%) | 8P (11.1%) | 11P (15.3%) |
| .bold[84P] | 9P (10.7%) | 10P (11.9%) | 13P (15.5%) |

.footnote[
<html> <hr> </html>
**Source:** Moore, J. F., Carvalho, A., Davis, G. A., Abulhassan, Y., & **Megahed, F. M.** (2021). Seat assignments with physical distancing in single-destination public transit settings. *IEEE Access*, 9, 42985-42993. Paper can be downloaded from [this link](https://ieeexplore.ieee.org/abstract/document/9374410).
]

---
count: false

# Seat Assignment with Social Distancing

<img src="data:image/png;base64,#figs/seat_planner.PNG" width="90%" style="display: block; margin: auto;" />

.center[.font80[Click [here to open the app](http://seatplanner.fsb.miamioh.edu/) and interact with it (e.g., pick **household** from the **dropdown next to optimization type**)]]

.footnote[
<html> <hr> </html>
**Source:** Moore, J. F., Carvalho, A., Davis, G. A., Abulhassan, Y., & **Megahed, F. M.** (2021). Seat assignments with physical distancing in single-destination public transit settings. *IEEE Access*, 9, 42985-42993. Paper can be downloaded from [this link](https://ieeexplore.ieee.org/abstract/document/9374410).
]

---
count: false

# Safety-Driven Truck Routing

.content-box-grey[
.center[.bold[Motivating Questions]]
.font80[
- Can we predict the occurrence of safety-critical events in the next 30 minutes of driving?
- Can we develop a truck routing routine that minimizes the occurrence of safety-critical events by suggesting non-shortest-distance routes (while not impacting on-time delivery)?
]
]

.pull-left[
<img src="data:image/png;base64,#figs/FullRegion.png" alt="Full region of recorded pings" width="45%" style="display: block; margin: auto;" />
.center[.font80[Full region of recorded pings]]
]

.pull-right[
<img src="data:image/png;base64,#figs/ImportantNodes.png" alt="Important nodes of the road network" width="45%" style="display: block; margin: auto;" />
.center[.font80[Important junctions (nodes) ]]
]

.footnote[
<html> <hr> </html>
**Source:** On-going joint work with [Qiong Hu](https://business.ucdenver.edu/about/our-people/qiong-hu), [Alex Vinel](https://eng.auburn.edu/directory/azv0019), and [Steven E. Rigdon](https://www.slu.edu/public-health-social-justice/faculty/rigdon-steven.php).
]

---
count: false

# Safety-Driven Truck Routing

<img src="data:image/png;base64,#figs/ksp_diagram.png" alt="The five-step process of our proposed framework. The first step involves extracting the data. In the second step, we utilize machine learning models to predict the occurrence of SCEs based on traffic, weather, road, and driver data. We also construct the road network. In the third step, we use the outputs from the ML process as inputs to an optimization model, where we try to minimize the distance traveled and the occurrence of SCEs. In step 4, we define the solution approach, and we rank the solutions in step 5." width="38%" style="display: block; margin: auto;" />

.footnote[
<html> <hr> </html>
**Source:** On-going joint work with [Qiong Hu](https://business.ucdenver.edu/about/our-people/qiong-hu), [Alex Vinel](https://eng.auburn.edu/directory/azv0019), and [Steven E. Rigdon](https://www.slu.edu/public-health-social-justice/faculty/rigdon-steven.php).
]

---
count: false

# Safety-Driven Truck Routing

.center[The predictive performance of the machine learning models on the holdout dataset.]

| Metric/Model | cart | glm | lasso | nb | nnet | rf | ridge | svm | xgb |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AUC | 0.693 | 0.723 | 0.722 | 0.740 | 0.752 | 0.752 | 0.710 | 0.745 | **0.765** |
| Accuracy | 0.662 | 0.745 | 0.736 | **0.803** | 0.711 | 0.688 | 0.711 | 0.696 | 0.700 |
| Sensitivity | 0.641 | 0.548 | 0.554 | 0.534 | 0.652 | 0.679 | 0.579 | 0.647 | **0.684** |
| Specificity | 0.663 | 0.747 | 0.738 | **0.805** | 0.711 | 0.688 | 0.712 | 0.696 | 0.700 |
| Gmean | 0.651 | 0.640 | 0.639 | 0.656 | 0.681 | 0.683 | 0.634 | 0.671 | **0.692** |

.footnote[
<html> <hr> </html>
**Source:** On-going joint work with [Qiong Hu](https://business.ucdenver.edu/about/our-people/qiong-hu), [Alex Vinel](https://eng.auburn.edu/directory/azv0019), and [Steven E. Rigdon](https://www.slu.edu/public-health-social-justice/faculty/rigdon-steven.php).
]

---
count: false

# Safety-Driven Truck Routing

<center>
<video width="44%" height="44%" autoplay controls aria-label="A video showing the routes corresponding to Pareto ranks 1, 2, and 3 for driving between Cincinnati, OH and Gary, IN.">
<source src="data:image/png;base64,#figs/case3.mp4" type="video/mp4">
</video>
</center>

.footnote[
<html> <hr> </html>
**Source:** On-going joint work with [Qiong Hu](https://business.ucdenver.edu/about/our-people/qiong-hu), [Alex Vinel](https://eng.auburn.edu/directory/azv0019), and [Steven E. Rigdon](https://www.slu.edu/public-health-social-justice/faculty/rigdon-steven.php).
]

---
class: inverse, center, middle

# Concluding Remarks

---

# `\(D^3\)` Requires a Team

.pull-left[
- `\(D^3\)` requires a **team**
- **Meaningful data-driven research/consulting** should be:
    + Application-based (“business”)
    + Driven by grand challenges
    + Impactful (i.e.
actionable) ] .pull-right[ <img src="data:image/png;base64,#figs/d3_team.png" alt="Data-driven decision making requires a team of information technology, domain knowledge and math/statistics" width="100%" style="display: block; margin: auto;" /> ]
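
---

# Backup: Reading the Gmean Row

The Gmean row in the holdout-performance table for the truck-routing models is not defined on the slide. A common reading, assumed here, is the geometric mean of sensitivity and specificity. The sketch below checks that reading against three columns of the table (xgb, nb, rf). The helper name `g_mean` is hypothetical, and the inputs are the rounded values shown in the table, so small rounding differences are expected.

```python
import math

# (sensitivity, specificity, Gmean) copied from the rounded table values.
reported = {
    "xgb": (0.684, 0.700, 0.692),
    "nb": (0.534, 0.805, 0.656),
    "rf": (0.679, 0.688, 0.683),
}


def g_mean(sensitivity: float, specificity: float) -> float:
    """Geometric mean of sensitivity and specificity (assumed definition of Gmean)."""
    return math.sqrt(sensitivity * specificity)


# Compare the recomputed value with the value reported in the table.
for model, (sens, spec, table_value) in reported.items():
    computed = g_mean(sens, spec)
    print(f"{model:4s} computed={computed:.4f} table={table_value:.3f}")
```

Under this reading, a model scores well on Gmean only when it does well on both classes, which is why nb has the highest accuracy and specificity in the table but not the highest Gmean.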